A new regional cotton growth model based on reference crop evapotranspiration for predicting growth processes

Meteorological conditions and irrigation amounts are key factors that affect crop growth processes. Typically, crop growth and development are modeled as a function of time or growing degree days (GDD). Although the most important component of GDD is temperature, it can vary significantly year to year while also gradually shifting due to climate changes. However, cotton is highly sensitive to various meteorological factors, and reference crop evapotranspiration (ETO) integrates the primary meteorological factors responsible for global dryland extension and aridity changes. This paper constructs a cotton growth model using ETO, which improves the accuracy of crop growth simulation. Two cotton growth models based on the logistic model established using GDD or ETO as independent factors are evaluated in this paper. Additionally, this paper examines mathematical models that relate irrigation amount and irrigation water utilization efficiency (IWUE) to the maximum leaf area index (LAImax) and cotton yield, revealing some key findings. First, the model using cumulative reference crop evapotranspiration (CETO) as the independent variable is more accurate than the one using cumulative growing degree days. To better reflect the effects of meteorological conditions on cotton growth, this paper recommends using CETO as the independent variable to establish cotton growth models. Secondly, the maximum cotton yield is 7171.7 kg/ha when LAImax is 6.043 cm2/cm2, the corresponding required irrigation amount is 518.793 mm, and IWUE is 21.153 kg/(ha·mm). Future studies should consider multiple associated meteorological factors and use ETO crop growth models to simulate and predict crop growth and yield.

With rapid population growth in recent decades, the demand for grain and cash crops has been constantly increasing. To improve crop yield and quality, it is necessary to accurately describe crop growth processes and better control the nutritional factors required for crop growth. A precise understanding of the complex interactions between crops and their surrounding environments is important to predict how environmental conditions affect crop growth processes. Towards this goal, crop growth models have been developed to simulate crop growth and development using complex mathematical functions and modelling techniques 1 . These models can quantitatively and dynamically describe crop growth development and yield formation processes 2 , which is useful for assessing the impact of drought on future crop yields. Additionally, crop growth models can provide detailed estimations of crop status, including phenological status, leaf area index (LAI), and yield of specific crop types 3 . Furthermore, these models can predict crop yields as a function of soil conditions, weather, and management practices 4 . Therefore, crop growth models have become important tools for quantitatively evaluating the relationships among soil, weather, and vegetation, and for facilitating the timely regulation of crop growth, which has attracted widespread attention.
Crop growth models have simple forms and are convenient to use 4 , and can be employed to simulate crop growth processes to guide agricultural production. Some models have been adapted to crop breeding to simulate the effects of changes in morphological and physiological characteristics of crops, which helps to identify optimal phenotypes in different environments 5 . In addition, some models have been used to simulate crop growth processes, including the Gompertz 6 , Richards 7 , and logistic 8,9 models. Pronk 10 established a dynamic model of dry matter accumulation (D) and LAI based on the relationship between wheat and radiation-use efficiency,

Materials and methods
Data sources. The cotton growth index data used in this study were obtained from previously published studies on cotton growth in China. The portion of the data used for analyzing the characteristics of cotton growth index in Xinjiang came from Wang 24 , and data from other regions are summarized in Table 1. The main cotton planting areas in China are concentrated in Central China, North China, and the Xinjiang region (Fig. 1). In Central China, the climate is characterized by cold winters and hot summers, with frequent drought and flooding. North China has a warm-temperate monsoon climate, with significant seasonal differences in temperature, precipitation, and evaporation. In Xinjiang, the climate is characterized by large temperature ranges throughout the year, low precipitation, and a dry continental climate. The sowing times for different varieties of cotton in different regions range from mid-April to mid-May, with harvest times from late September to late October.
Meteorological data were collected from the National Meteorological Information Center of the China Meteorological Data Service Center (http://data.cma.cn/) and from the ERA5 hourly data on single levels from 1979 to present (https://doi.org/10.24381/cds.adbb2d47). The data were used to analyze the effects of meteorological conditions on regional cotton growth. Specifically, we collected data on temperature, solar radiation, humidity, wind speed, and precipitation. To collect cotton growth index data, we used GetData Graph Digitizer to extract data directly from text or figures in published studies. More than three sets of data samples were selected from most regions, but a few areas had only 1-3 sets of data samples.

Table 1
www.nature.com/scientificreports/

Data processing method and error analysis. All data were processed in Microsoft Office Excel (Microsoft Corporation), and MATLAB (MathWorks Inc., Natick, MA, USA) was used for model parameter calculation. ArcMap 10.5 was employed to generate the map of the study area. To minimize the impact of errors that may have arisen during data digitization, we took the following steps: (i) we carefully reviewed all digitized data points for obvious outliers or inconsistencies and corrected any that were identified; (ii) we compared the digitized data with the original data source and digitized the same data three times to ensure that there were no significant differences. Correlations were assessed using R², and accuracy was evaluated through the root mean square error (RMSE) and relative error (RE), as follows: where x_i is the independent variable; y_i is the dependent variable; x̄ and ȳ represent the average values of x_i and y_i, respectively; m_vi is the measured value; and c_vi is the calculated value.
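The three evaluation statistics can be sketched in a few lines. The formulas themselves did not survive in this text, so the forms below are assumptions: R² is taken as the squared Pearson correlation of x_i and y_i, RMSE as the usual root mean square difference between m_vi and c_vi, and RE as the absolute summed deviation divided by the summed measured values, in percent. The function names are illustrative, and the RE form in particular may differ from the one the authors used.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation between paired samples x and y."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    return s_xy ** 2 / (s_xx * s_yy)

def rmse(measured, calculated):
    """Root mean square error between measured and calculated values."""
    n = len(measured)
    squared = sum((m - c) ** 2 for m, c in zip(measured, calculated))
    return math.sqrt(squared / n)

def relative_error(measured, calculated):
    """ASSUMED form of RE: |sum(m - c)| / sum(m), expressed in percent."""
    deviation = sum(m - c for m, c in zip(measured, calculated))
    return abs(deviation) / sum(measured) * 100.0

# Illustrative relative growth values (not data from the study).
measured = [0.20, 0.45, 0.70, 0.95]
calculated = [0.22, 0.43, 0.73, 0.92]
print(r_squared(measured, calculated))
print(rmse(measured, calculated))
print(relative_error(measured, calculated))
```

Under this assumed RE, positive and negative deviations cancel, so RE can be small even when RMSE is not; the two statistics are best read together.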

Mathematical model development and construction
Cotton growth model based on cumulative growing degree days. Every crop has biologically defined upper and lower temperature limits beyond which its growth and development can be adversely affected. For cotton, the upper and lower temperature limits are 40 °C and 10 °C, respectively, as demonstrated by numerous studies 24,45-47. Growing degree days (GDD) are a measure of heat accumulation used to estimate crop development and growth. In this study, we converted daily GDD data to cumulative GDD (CGDD) using the following equation:

$$\mathrm{CGDD} = \sum \left( T_{avg} - T_{base} \right)$$

where CGDD is the cumulative growing degree days, °C; T_avg is the mean daily temperature, °C; and T_base is the minimum daily temperature required for crop activity, °C. McMaster and Wilhelm proposed a method for calculating T_avg 48: where T_upper is the maximum temperature at which crop activities continue, °C; T_x is the highest daily temperature, °C; and T_n is the lowest daily temperature, °C.
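A minimal sketch of the CGDD calculation follows. The exact expression for T_avg is not recoverable from this text, so the code assumes one common reading of the McMaster and Wilhelm approach: the daily maximum and minimum are each clamped to the interval [T_base, T_upper] before averaging. The default limits are the 10 °C and 40 °C quoted above for cotton; the function names are illustrative.

```python
def daily_gdd(t_max, t_min, t_base=10.0, t_upper=40.0):
    """Daily growing degree days (°C) for one day.

    ASSUMPTION: both extremes are clamped to [t_base, t_upper]
    before averaging, so the result is never negative.
    """
    t_x = min(max(t_max, t_base), t_upper)
    t_n = min(max(t_min, t_base), t_upper)
    t_avg = (t_x + t_n) / 2.0
    return t_avg - t_base

def cumulative_gdd(daily_max, daily_min, t_base=10.0, t_upper=40.0):
    """Running total of daily GDD, one value per day."""
    total = 0.0
    series = []
    for t_max, t_min in zip(daily_max, daily_min):
        total += daily_gdd(t_max, t_min, t_base, t_upper)
        series.append(total)
    return series

# Three illustrative days (not data from the study).
print(cumulative_gdd([30.0, 42.0, 8.0], [20.0, 8.0, 2.0]))
```

With this clamping, a day entirely below the base temperature contributes zero rather than a negative value, which keeps CGDD non-decreasing.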
A logistic model describing plant height (H), leaf area index (LAI), and dry matter accumulation (D) of a crop using CGDD as the independent variable was established. Relative values of H, LAI, and D were used to minimize the differences in cotton growth indicators caused by irrigation methods or soil conditions in different regions, and the relative growth indices and relative CGDD were computed from these normalized indicators. The logistic models between the relative CGDD (R_CGDD) and relative H, relative LAI, and relative D are expressed as: where R_H is the relative plant height; H is the plant height, cm; Hmax is the theoretical maximum H, cm; R_LAI is the relative leaf area index; LAI is the leaf area index, cm2/cm2; LAImax is the theoretical maximum LAI, cm2/cm2; R_D is the relative dry matter accumulation; D is the dry matter accumulation, g/plant; Dmax is the theoretical maximum D, g/plant; R_CGDD is the relative CGDD; and a1, a2, a3, b1, b2, b3, and c2 are model fitting parameters. The maximum value measured in a field experiment may fall short of the theoretical maximum, so we scaled the measured maxima by multiplication factors (for example, the measured maximum H was multiplied by a factor between 1.01 and 1.05) to obtain the theoretical maximum values 8,24.

Cotton growth model based on cumulative reference crop evapotranspiration. The crop growth model based on GDD accounts only for the effect of temperature on crop growth, but cotton growth is influenced by other meteorological factors as well. Therefore, GDD alone is insufficient to model cotton growth accurately. Instead, we used reference crop evapotranspiration (ETO), which integrates meteorological conditions such as temperature, solar radiation, water vapor pressure, and wind speed. The Penman-Monteith method is a widely accepted method for calculating ETO, as it incorporates both physiological and meteorological parameters.
We used the FAO-56 Penman-Monteith method, which is a standard and reliable approach used worldwide. To account for cumulative evapotranspiration over time, we calculated the cumulative reference crop evapotranspiration (CETO) by summing the daily ETO values:

$$ET_O = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}$$

$$CET_O = \sum ET_O$$

where ETO is the reference crop evapotranspiration, mm/d; R_n is the net radiation, MJ/m2/d; G is the soil heat flux, MJ/m2/d; γ is the psychrometric constant, kPa/°C; Δ is the slope of the saturation vapor pressure versus temperature curve, kPa/°C; e_s - e_a is the vapor pressure deficit, kPa; T is the average air temperature, °C; and u_2 is the wind speed at approximately 2 m above ground level, m/s.
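The daily ETO and its running total can be sketched as below, using the standard FAO-56 Penman-Monteith expression with the variables just defined. The two helper functions for saturation vapor pressure and for Δ follow the usual FAO-56 formulas and are included only for convenience; the function names and the input values are illustrative, and how the authors derived R_n, e_s, and e_a from the raw station and ERA5 data is not stated here.

```python
import math
from itertools import accumulate

def saturation_vapor_pressure(t_celsius):
    """Saturation vapor pressure (kPa) at air temperature t_celsius."""
    return 0.6108 * math.exp(17.27 * t_celsius / (t_celsius + 237.3))

def slope_vapor_pressure_curve(t_celsius):
    """Slope of the saturation vapor pressure curve, Delta (kPa/°C)."""
    e0 = saturation_vapor_pressure(t_celsius)
    return 4098.0 * e0 / (t_celsius + 237.3) ** 2

def reference_et(delta, gamma, r_n, g, t_mean, u_2, e_s, e_a):
    """Daily reference crop evapotranspiration (mm/d), FAO-56 form."""
    radiation_term = 0.408 * delta * (r_n - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u_2 * (e_s - e_a)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u_2))

def cumulative_et(daily_et):
    """Running total of daily ETO (mm), one value per day."""
    return list(accumulate(daily_et))

# Illustrative inputs for a single mild summer day (not data from the study).
eto = reference_et(delta=0.122, gamma=0.0666, r_n=13.28, g=0.0,
                   t_mean=16.9, u_2=2.078, e_s=1.997, e_a=1.409)
print(round(eto, 2))
print(cumulative_et([3.0, 4.0, 5.5]))
```

Summing from sowing onward gives the CETO series that replaces CGDD as the independent variable in the growth model.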
Thus, the logistic model with CETO as the independent variable is expressed as:  50,51 .

We analyzed the relationships of H with CGDD and CETO using partial samples from different regions, as shown in Figs. 2 and 3. The growth curve of H against both indicators showed a similar "S" shape, indicating slow growth in the early stage, rapid growth in the middle stage, and slow growth in the later period. This pattern suggests that high temperatures and sufficient light promote rapid growth in the middle stage, whereas low temperatures and reduced sunshine in the later period lead to slower growth. In southern Xinjiang, H peaked at approximately 1800 °C CGDD and 600 mm CETO, while in northern Xinjiang, H peaked at around 1600 °C CGDD and 700 mm CETO. The difference in CGDD between southern and northern Xinjiang was mainly due to temperature, with low precipitation and high temperatures in the former and vice versa in the latter. In contrast, CETO reflects various meteorological conditions, and thus the difference in CETO between the two regions was smaller. This suggests that CETO is a more comprehensive indicator for regulating cotton H, as it considers multiple meteorological factors. The characteristics of H change were similar across all regions, indicating a consistent pattern of growth regulation. To normalize the H change characteristics, we established relative logistic models of cotton R_H with R_CGDD and R_CETO using data from different regions. The R_H change curves, along with the fitting results, are presented in Fig. 4 (P < 0.05). These models provide a useful framework for analyzing the effect of different meteorological conditions on cotton growth, and can be used to identify the optimal conditions for maximizing R_H and other growth indicators.
The R_H models were established as follows:

Figure 4 shows the fitting results of cotton R_H based on CGDD and CETO in different regions; R² ≥ 0.866, indicating that the models fit the data well. Moreover, the fit with R_CETO as the independent variable was better than that with R_CGDD, and the scatter of the data was smaller for R_CETO than for R_CGDD. Experimental data that were not included in the modeling were used to further validate the models (Fig. 5A and B; R² ≥ 0.891, RMSE ≤ 0.072, RE ≤ 0.970%). There was good agreement between the measured and calculated values of R_H. These results showed that the model with R_CETO as the independent variable was more accurate than that with R_CGDD.
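To make the fitting step concrete, the sketch below estimates the two parameters of a relative logistic curve from paired observations. Two assumptions apply: the curve is taken to have the form R = 1 / (1 + exp(a + b·x)), which is a common logistic form but is not confirmed by the surviving text, and the parameters are estimated by ordinary least squares on the linearized relation ln(1/R - 1) = a + b·x. The study reports using MATLAB without describing the fitting procedure, so this linearization is a simple stand-in and not the authors' method.

```python
import math

def logistic(x, a, b):
    """ASSUMED relative logistic form: R = 1 / (1 + exp(a + b*x))."""
    return 1.0 / (1.0 + math.exp(a + b * x))

def fit_logistic(x, r):
    """Estimate (a, b) by least squares on ln(1/R - 1) = a + b*x.

    Every value in r must lie strictly between 0 and 1.
    """
    z = [math.log(1.0 / ri - 1.0) for ri in r]
    n = len(x)
    x_mean = sum(x) / n
    z_mean = sum(z) / n
    s_xz = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b = s_xz / s_xx
    a = z_mean - b * x_mean
    return a, b

# Synthetic example: relative CETO values and a noise-free logistic response
# generated with hypothetical parameters a = 3, b = -8.
xs = [0.1 * i for i in range(1, 10)]
rs = [logistic(x, 3.0, -8.0) for x in xs]
a_hat, b_hat = fit_logistic(xs, rs)
print(a_hat, b_hat)
```

The linearized fit weights points near R = 0 and R = 1 more heavily than a direct nonlinear fit would, so with noisy field data a nonlinear least-squares routine is usually the better choice.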
Growth model characteristics of cotton leaf area index. As     are highly important and cotton growth requires sufficient temperature and light. When the CGDD was about 1400 °C, LAI reached its maximum value in Xinjiang. When the CGDD was about 1600 °C, LAI reached its maximum value in Central China and North China. However, when CETO was used to describe the LAI growth process, the CETO values at which the LAI reached its maximum varied greatly among regions. This was because meteorological conditions varied greatly among regions, and while CGDD is only sensitive to temperature, CETO integrates many meteorological factors, which makes it a more realistic predictor of the growth process of LAI. This further showed that replacing CGDD with CETO will improve the accuracy of a crop growth model. It can be seen from Figs. 6 and 7 that there were differences in the LAI among different regions due to varying conditions, such as soil fertility, irrigation, and fertilization systems. Despite these differences, the change trends in LAI remained consistent throughout the growth period. The change characteristics of the LAI were analyzed and compared in different regions to determine the universal change characteristics. Thus, logistic models of cotton R_LAI related to R_CGDD and R_CETO were established, as shown in Fig. 8. The curve's rate of change was relatively high in the early stages of cotton growth, indicating that appropriate meteorological conditions can significantly enhance leaf area growth during this period. The R_LAI was fit to the models as shown in Fig. 8 (P < 0.05). The R_LAI model was established as follows: The fitting and validation results of the CGDD- and CETO-based models are shown in Figs. 8 and 9, with R² ≥ 0.778, RMSE ≤ 0.133, and RE ≤ 3.627%. The fit with R_CETO as the independent variable was better than that with R_CGDD, and the scatter of the data was smaller for R_CETO than for R_CGDD.
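Because LAI rises and then declines, a logistic curve for R_LAI needs a term that lets it turn over. A brief sketch of where such a curve peaks is given below, assuming the modified logistic form R_LAI = 1 / (1 + exp(a + b·x + c·x²)) with c > 0; this form is inferred from the parameter list (a2, b2, c2) and is not confirmed by the surviving text. Setting the derivative to zero gives b + 2·c·x = 0, so the maximum lies at x = -b / (2c). The parameter values used are hypothetical, chosen only so that the peak falls at a relative value of 0.8.

```python
import math

def modified_logistic(x, a, b, c):
    """ASSUMED form: R_LAI = 1 / (1 + exp(a + b*x + c*x**2))."""
    return 1.0 / (1.0 + math.exp(a + b * x + c * x * x))

def peak_position(b, c):
    """Relative CGDD or CETO at which R_LAI is largest (requires c > 0)."""
    if c <= 0:
        raise ValueError("c must be positive for the curve to have a maximum")
    return -b / (2.0 * c)

# Hypothetical parameters (not the fitted values from the study).
a, b, c = 5.0, -16.0, 10.0
x_peak = peak_position(b, c)
print(x_peak, modified_logistic(x_peak, a, b, c))
```

The intercept a shifts the height of the peak but not its position, which depends only on the ratio of b to c.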
If dR_LAI/dR_CGDD = 0 and dR_LAI/dR_CETO = 0, the maximum R_LAI occurred when R_CGDD and R_CETO were 0.794 and 0.824, respectively. These results showed

Growth model characteristics of cotton dry matter accumulation. Crop yield is based on dry matter production, which is determined by nutrient absorption 52. Dry matter accumulation is an essential indicator of cotton growth and is crucial for achieving high-quality and high-yielding cotton crops. The trends in the increase and decrease of cotton dry matter were similar among different regions. The rate of change of dry matter accumulation peaked when CGDD was between 700 and 1000 °C (Fig. 10) and CETO was between 200 and 500 mm (Fig. 11), during the budding, flowering, and boll development stages. During these stages, many leaves grew, and photosynthesis significantly increased due to sufficient temperature and light availability, leading to high cotton yields. Dry matter generally accumulated throughout the growing season, starting slowly during the seedling stage, gradually accelerating during the budding stage, peaking during the flowering to boll setting stages, and then stabilizing. However, due to the arid climate in Xinjiang, with high evaporation and little precipitation, dry matter accumulation peaked when CGDD was about 1900 °C and CETO was about 750 mm. The peak CGDD for dry matter accumulation was lower in Xinjiang than in Central China and North China, while the peak CETO was higher in Xinjiang than in Central China and North China. The change characteristics of cotton D in separate regions were analyzed and compared to reveal the universal change characteristics of D. Thus, logistic models of the change processes of cotton R_D with R_CGDD and R_CETO were established, as shown in Fig. 12. The fitting result is shown in Fig. 12 (P < 0.05), with R_D calculated using the following formulas: The fitting and validation results of the CGDD- and CETO-based models are shown in Figs. 12 and 13, with R² ≥ 0.674, RMSE ≤ 0.138, and RE ≤ 4.069%. Similar to H and LAI, the fit with R_CETO as the independent variable was better than that with R_CGDD, and the scatter of the measured values was smaller than that for R_CGDD. Therefore, by using R_CETO as the independent variable, a cotton growth index model with more accurate fitting results can be established. Such a model reflects the cotton growth process more precisely.

Growth model comparison. It can be seen from Fig. 14 that, as R_CGDD and R_CETO changed, the rate of change in R_LAI was greater than in R_H and R_D in the early stage of cotton growth. Conversely, the rate of change in R_LAI was smaller than in R_H and R_D in the later stage of cotton growth. This indicates that suitable meteorological conditions significantly affect the formation of D in the later stages of cotton growth, during which the energy absorbed is mainly used for the growth of the cotton bolls. This growth pattern is similar to that of other crops, such as potatoes 23 and rice 9. The rate of change in the CETO-based model was greater than in the CGDD-based model in the middle and later stages of cotton growth. This could be attributed to the fact that cotton growth is sensitive to various meteorological conditions, such as solar radiation, temperature, and humidity, which are included in the CETO calculation. This further supports the view that CETO is preferable for describing the cotton growth process. In addition, Eqs. (13)-(18) can be differentiated twice to obtain the change process of the first derivative of the logistic model with R_CGDD and R_CETO. The figure shows that the rates of change and the inflection points of the CGDD- and CETO-based logistic models were different. Therefore, although the CGDD calculation is simple and the CETO calculation is complex, CETO should be used when modeling to ensure that the model reflects the impacts of meteorological conditions on cotton growth, which CGDD is unable to do.
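The rate-of-change comparison can be illustrated with the derivative of a simple logistic curve. Assuming the form R = 1 / (1 + exp(a + b·x)), which is not confirmed by the surviving text, the first derivative is dR/dx = -b·R·(1 - R); the second derivative vanishes where R = 0.5, that is at x = -a/b, and the growth rate there is -b/4. The parameter values are hypothetical and the function names are illustrative.

```python
import math

def logistic(x, a, b):
    """ASSUMED relative logistic form: R = 1 / (1 + exp(a + b*x))."""
    return 1.0 / (1.0 + math.exp(a + b * x))

def growth_rate(x, a, b):
    """First derivative dR/dx = -b * R * (1 - R)."""
    r = logistic(x, a, b)
    return -b * r * (1.0 - r)

def inflection_point(a, b):
    """Value of x where the growth rate is largest (R = 0.5)."""
    return -a / b

# Hypothetical parameters (not fitted values from the study).
a, b = 3.0, -8.0
x_star = inflection_point(a, b)
print(x_star, growth_rate(x_star, a, b))
```

Comparing the inflection points of two fitted curves in this way shows at which relative CGDD or CETO each growth index is changing fastest.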
Cotton boll growth relies on the contributions from all parts of the plant, as reflected by the growth indexes of H, LAI, and D. To comprehensively examine the effects of meteorological conditions on cotton growth and compare with traditional methods, we analyzed the impacts of CGDD and CETO on cotton H, LAI, and D. The results indicate that using CGDD alone to describe cotton growth yields a relatively simple relationship that does not fully reflect the impacts of meteorological conditions. Because meteorological conditions are spatially heterogeneous, cotton growth varies across regions, leading to spatial differences in yield. Therefore, it is crucial to consider meteorological conditions when describing the cotton growth process. Although differences in cotton H, LAI, and D, as well as irrigation water, directly affect yield, leaves are the main organs of crop photosynthesis and transpiration. Therefore, further quantitative studies on the relationship between cotton LAI and yield, as well as irrigation water, are necessary.

Relationship of maximum leaf area index with cotton yield and irrigation amount. Cotton yield (Y) is determined by multiple factors, including LAI and irrigation water amount (W). Previous studies have reported that increasing leaf area contributes little towards yield, because the dark respiration rate is enhanced correspondingly after leaf expansion, which is not conducive to biomass accumulation 53. A suitable LAI, or a suitable increase in crop LAI, is required to achieve higher yields. In particular, higher yields are related to the sustained photosynthetic activity of leaves. Therefore, it is important to establish the relationship of LAImax with Y and W to understand cotton yields (Fig. 15). The LAImax range was divided into seven categories (2-3, 3-4, 4-5, 5-6, 6-7, 7-8, and 8-9), and the W range was divided into six categories (400-450, 450-500, 500-550, 550-600, 600-650, and 650-700 mm). The average LAImax and W were calculated for each category.
The relationship of LAImax with Y and W can be described by the quadratic polynomial functions:

The R² values of the fitted curves relating LAImax to Y and W to LAImax were 0.888 and 0.841, respectively. Setting the first-order derivative of Eq. (19) to 0 showed that Y was maximized (7171.7 kg/ha) at an LAImax of 6.043 cm2/cm2. Similarly, the first-order derivative of Eq. (20) was 0 at an LAImax of 4.938 cm2/cm2, which corresponded to a W of 573.541 mm. The results suggested that if LAImax values were close to 6.043 cm2/cm2, the cotton yield would be higher.

(21). Therefore, the IWUE can be calculated from W. Figure 17 presents the validation results of the measured and calculated values. The R², RMSE, and RE were determined to be 0.701, 0.972 kg/(ha·mm), and 0.299%, respectively.
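The derivative step used to locate the optimum can be sketched for a generic quadratic. For Y = p·x² + q·x + r with p < 0, setting dY/dx = 2·p·x + q to zero gives the optimum at x = -q / (2p), where Y = r - q² / (4p). The coefficients of Eqs. (19) and (20) are not recoverable from this text, so the values below are hypothetical placeholders and do not reproduce the reported optimum.

```python
def quadratic(x, p, q, r):
    """Quadratic response Y = p*x**2 + q*x + r."""
    return p * x * x + q * x + r

def quadratic_optimum(p, q, r):
    """Return (x*, Y*) where the first derivative 2*p*x + q equals zero."""
    if p == 0:
        raise ValueError("p must be non-zero for a quadratic")
    x_opt = -q / (2.0 * p)
    return x_opt, quadratic(x_opt, p, q, r)

# HYPOTHETICAL coefficients for yield (kg/ha) as a function of LAImax.
p, q, r = -200.0, 2400.0, 0.0
lai_opt, y_opt = quadratic_optimum(p, q, r)
print(lai_opt, y_opt)
```

A negative p makes the stationary point a maximum; with a positive p the same formula would return a minimum instead.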

Discussion
This study demonstrates the important roles that CGDD and CETO play in simulating crop growth, and highlights the need to investigate the relationship between them. Su 28 analyzed CGDD trends and the correlation between CGDD and CETO in the Turpan area of China. Their findings indicate a strong correlation between R_CGDD and R_CETO, and suggest that a cubic polynomial function (Eq. (23), Fig. 18A) can accurately describe the relationship between grape budding and maturity. The logistic growth model can be expressed as follows: where g, h, k, and l are model fitting parameters; y1 is cotton H or D; and y2 is cotton LAI. However, the cubic polynomial function has many parameters and a complex form, which makes the equation transformation difficult to carry out. In this study, we found that the relationship between R_CGDD and R_CETO in different regions can be approximated using a linear function (Eq. (25), Fig. 18B). The linear function has fewer parameters, so it is simpler in form and easier to calculate. The linear function relating R_CGDD and R_CETO, with R_CGDD as the independent variable, is as follows: where m and n are model fitting parameters.
Combining Eqs.  f2 and a1, a2, a3, b1, b2, b3, and c2 can be expressed as: Likewise: We used the equations to simulate the relationship between R_CGDD and R_CETO in each region and presented the results in Fig. 19. The fits had R² ≥ 0.976, and we obtained the values of parameters m and n for each region. Table 2 shows the validation of parameters d1, d2, d3, e1, e2, e3, and f2, which were calculated using Eqs. (26)-(34) for the different regions. The validation of the model parameters for H and D showed a good fit, with RE < 0.10. However, the LAI model parameter validation showed unsatisfactory results. We used the modified logistic model to simulate the LAI change process, but this model was not well suited to the task because the quadratic polynomial in its exponential term increased the possibility of error during equation transformation. Nonetheless, parameters d, e, and f can be approximately calculated from parameters a, b, and c, which is convenient for simplified calculations or for simulating the crop growth process when meteorological data are lacking.
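The parameter conversion can be sketched as a substitution. Suppose, as assumptions, that the linear relation reads R_CETO = m·R_CGDD + n, that the CGDD-based model has exponent a + b·x + c·x² in x = R_CGDD, and that the CETO-based model has exponent d + e·u + f·u² in u = R_CETO (with c = f = 0 for H and D). Substituting u = m·x + n and matching powers of x gives a = d + e·n + f·n², b = e·m + 2·f·m·n, and c = f·m², which can be inverted as below. The study's own Eqs. (26)-(34) are not recoverable from this text, so this derivation may differ from them in form.

```python
def cgdd_to_ceto_params(a, b, c, m, n):
    """Convert CGDD-model parameters (a, b, c) to CETO-model (d, e, f).

    ASSUMES R_CETO = m * R_CGDD + n and polynomial exponents
    a + b*x + c*x**2 (CGDD) and d + e*u + f*u**2 (CETO).
    """
    f = c / (m * m)
    e = b / m - 2.0 * f * n
    d = a - e * n - f * n * n
    return d, e, f

# Hypothetical parameter values (not fitted values from the study).
a, b, c = 5.0, -16.0, 10.0
m, n = 0.9, 0.05
d, e, f = cgdd_to_ceto_params(a, b, c, m, n)

# The two exponents agree at any relative CGDD when the relation is exact.
x = 0.6
u = m * x + n
print(a + b * x + c * x * x, d + e * u + f * u * u)
```

The conversion is exact only if the linear relation holds exactly; any scatter about the fitted line carries over into d, e, and f, and the quadratic term amplifies it, which is consistent with the weaker validation reported for the LAI parameters.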
Climate suitability is an important ecological characteristic for crops and forms the basis of the distribution of crop production 54. Cotton is sensitive to climate variability and is unable to resist damage caused by drought and cold 55. Numerous studies have reported quantitative predictions of the impact of future climate change on cotton production 56-60. These studies generally predict major changes in the trends of cotton production under future climate scenarios, and describe how cotton agricultural production can respond and adapt to climate change. However, discrepancies among research methods (such as crop models, climate models, and climate scenarios) create uncertainties regarding the extent and severity of the impacts that future climate change will have on cotton yields. Therefore, future research should focus on quantitatively exploring the relationship between climate change and the cotton growth process. Many mathematical models have been used to describe crop growth processes, such as the logistic, Richards, Gompertz, and Hoerl models. However, previous research has shown that the logistic model fits better when time or CGDD is used as the independent variable 11,12. Many studies have quantitatively analyzed the impact of CGDD on crop growth, including for winter wheat 8, rice 9, cotton 24, potato 23, maize 61, and watermelon 26, but these models have some limitations, as CGDD considers only a few meteorological factors, such as the highest and lowest daily temperatures. However, meteorological factors such as temperature, wind, rainfall, relative humidity, and sunshine duration significantly affect the production of cotton flowers and bolls 20.
For instance, temperature is the most critical meteorological factor that affects cotton yield, and heat stress can lead to reduced fruit retention, delayed crop maturity, and lower lint quality. Strong winds may also cause boll shedding, thereby reducing yield. Additionally, continuous rainfall during flowering and boll opening can impair pollination and potentially reduce fiber quality 31. Moreover, climate change is a crucial issue worldwide, and its negative effects on cotton growth and development can result from an increased number and severity of days with very high temperatures during the cotton season. These events can lower cotton yields by decreasing daily photosynthesis and, at times, raising respiration at  20,31,56,57. Further research is necessary to comprehensively consider these factors when simulating cotton growth. In this study, we present a new cotton growth model that tries to overcome some of these limitations, and we demonstrate its advantages over existing models through comparison and analysis. Specifically, we use CETO, which comprehensively considers meteorological conditions such as local solar radiation, humidity, temperature, and wind speed. Our results showed that the CETO-based logistic model described the cotton growth process more accurately than the CGDD-based model. We thoroughly evaluated both CETO and CGDD as predictors for simulating crop growth processes. In future studies, we plan to make further improvements to this method and expand our research to other crops. These research results and the constructed mathematical models can provide a scientific basis for improving agricultural production efficiency while also contributing to the improvement and development of crop growth models.

Figure 19. Relationships between relative cumulative growing degree days and relative cumulative reference crop evapotranspiration in every region.

Conclusion
To provide a more accurate description of the effects of climate change on cotton growth, this paper proposes a novel logistic cotton growth model that uses CETO instead of CGDD as the independent variable. Although crop growth and yields varied significantly across different regions and years, the overall trends in their development throughout the year remained consistent. A comparison between models using CETO and CGDD as independent variables tested the precision of the former in simulating crop growth. The results showed that CETO can comprehensively reflect the impacts of meteorological factors on cotton growth and more accurately describe the cotton growth process. Additionally, this study established mathematical models relating cotton LAImax to Y and W. When cotton Y reached its maximum value of 7171.7 kg/ha, the value of LAImax was 6.043 cm2/cm2. The required W was 518.793 mm, and the IWUE was 21.153 kg/(ha·mm). In the future, more attention should be paid to the impacts of meteorological conditions and irrigation on crop growth.

Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.